A STATISTICAL THEORY OF COMPUTER PROGRAM TESTING


Arthur E. Laemmel
Abstract 


Most computer programs are tested with some of the possible sets of
input data, but few can be tested with all possible input data. Passing
such a partial test cannot ensure that the program will always function
correctly; at best we can say that the probability of failure is less than
a specified amount. The purpose of this report is to derive several
formulas for this probability. Also, in some cases an optimum testing
strategy can be derived which minimizes the probability of failure.


1.0 Introduction


There are three methods for maximizing the reliability of computer
programs: (1) use a systematic procedure which makes it difficult for
errors to occur during the writing of the program, (2) prove that the
program works correctly by some formal or automatic process, and (3) test
and debug the program thoroughly before passing it on to the user. Most
programmers will use some combination of these methods, and in fact some
procedures involve elements of more than one method. The present report
emphasizes testing, but first some remarks will be made about program
writing and proving.

(1) Writing Correct Programs: While no one intentionally inserts
errors in his program, it is undoubtedly true that many
people would produce more reliable programs with less effort
if they were taught better programming techniques. However,
it seems obvious that the average programmer should
test his programs even if he exercises the maximum of care
and uses the best techniques.

(2) Proving Program Correctness: There are several reasons
why a formal procedure for proving program correctness
cannot be relied on to ensure absence of errors in practical
situations: (i) a uniform algorithm for proving the correctness
of an arbitrary program can be shown to be impossible,
being essentially equivalent to Turing's halting problem;
(ii) even for solvable sub-classes of the correctness-proving
problem, the usual method (some improvement on Herbrand
search) is so time consuming as to be impractical; (iii) there
is always a possibility of error in the proving program, or in
applying it to the program being tested.

(3) Testing Computer Programs: In view of the difficulties of
validating a computer program by programming techniques or
formal proof methods, it is believed that some amount of
testing will always be necessary. The purpose of this
report is to describe a model which shows the relationship
between errors of different types and the probability that
they will cause a program to fail, and also to suggest optimum
testing methods which minimize the probability of
program failure.

Throughout this report it is assumed that a computer program can be
tested and that it will either pass or fail the test. It is also assumed that
a tester and a user will interpret failure of the program in exactly the same
way. For example, failure might mean one of the following:

i) the program "bombs" completely.

ii) an obviously wrong answer is given.

iii) the numbers are inaccurate in the last decimal place.


iv)  the  numbers  are  correct,  but  the  format  is  wrong. 

v)  the  program  works  perfectly,  but  a  side  effect  causes  failure 
in  another  program. 

It must also be decided whether the program alone is being tested, or
whether the program and its algorithm are being tested. This report is
directed mainly to the latter, but the results can be suitably interpreted
so as to apply to the former.

The extent to which a computer program (possibly including the
underlying algorithm) can be tested varies from application to application
and is usually not a yes-or-no situation. Whether or not a statistical model
applies to a particular case depends very strongly on the tester's knowledge
of just what answer should be produced by the program. Some of the many
possibilities of the user's prior knowledge of the correct answer are:

1. The exact answer is known beforehand. An example would
be a math package for a new computer. Accurate tables of
cos, arctan, log, etc. have been available for many years.

2.  A  proposed  answer  can  be  checked  to  see  if  it  is  a  correct 
answer,  but  the  correct  answer  is  not  known  beforehand. 
Programs which calculate the roots of transcendental
equations are examples of this category.

3. Answers for certain special input values are known
beforehand. This is very common. For example, if a program is
supposed  to  output  the  capacitance  of  an  arbitrary  two 
conductor  transmission  line,  it  would  be  natural  to  test  it  for 
coaxial  cylinders. 

4. The answer is not known beforehand, nor can it be
accurately checked for any combination of input values.
However, certain consistency relations among different outputs
are known. For example, it may be obvious that the
program should generate an output which is a monotonically
increasing function of the input. A test which detects a
decrease indicates an incorrect program, but no amount of
testing can indicate a correct program.

5.  Absolutely  nothing  is  known  about  the  answers  which  should 
be  produced.  This  is  probably  very  rare.  Even  here,  the 
theory  presented  in  this  report  applies  if  the  program 
"bombs,"  i.e.,  a  fatal  or  non-fatal  run-time  error  message  is 
produced. 

The  identification  of  an  error  is  often  not  unique.  This  is  illustrated 
by  the  fact  that  error  messages  from  a  compiler  or  run-time  monitor  can 
direct  the  user  away  from  what  he  considers  as  the  error.  For  example,  if 
the compiler says "line 15, operator missing," the error might really be
"line 11, statement terminator missing." In many cases the syntax can be
corrected in several places to get the program through the compile stage,
but one place usually contains the reported error, and another place is
usually where the error should be corrected. Similar comments can be made
about semantic errors detected at run time. In some cases an error might
equally well be corrected in one of several places. For example, a common
PL/1 error might be corrected by either declaring K to be a FLOAT number,
by using XK instead, or by forcing a necessary type conversion. This type of
ambiguity in defining the error is not believed to cause any difficulties
with the formulas developed in this report, provided the parameters are
interpreted correctly.

2.0 Definitions and Objectives

2.1 Definitions. Some basic aspects of the testing process apply equally
well to a computer program or to a physical device. For this reason the
program or device being tested will sometimes be called the module. After
the module has been constructed, it is checked by a tester and then
employed by a user. The probability of error, $P_e$, which occurs during use
is given by

$$P_e = P_T \, P_U \qquad (1)$$

where $P_T$ is the probability that the tester misses all of the residual bugs
in the module, and $P_U$ is the probability that the user then encounters one
of the overlooked bugs. If exhaustive testing is possible then $P_T = 0$,
since it is assumed that if a bug is found it is then corrected and the
whole process is started again. In most cases of interest exhaustive testing
is not possible or practical. If there are no bugs then all of the
probabilities are zero, and this possibility appears as a special case in
the analysis to follow. Some alternate definitions of "probability of error"
are given in Section 2.2.


2.2 Objectives. Before presenting the detailed mathematical results,
the objectives of this work will be stated more explicitly. Roughly, the
aims are to provide an estimate of the number of tests required to say
that the program being tested is correct with a given probability, and to
provide a strategy so that a given number of tests can be chosen most
effectively.

The first objective is met by deriving a functional relationship between
the number of tests and the probability of error. Of the several
possibilities for defining probability of error, the one used here might be
called the "probability of embarrassment." The tester is not embarrassed if

he discovers a bug and returns the program to the writer, or

he approves the program in spite of its having one or more
bugs, but the user does not encounter one of the remaining bugs.

The tester is embarrassed if

he approves the program and then the user encounters an
undetected bug.

If the model is designed appropriately, the probability of error decreases
as the number of tests is increased according to Eq. (30) below.


The second objective is met by choosing those tests which minimize the
expression derived for probability of error (defined as above) while keeping
the number of tests fixed. Specifically, the principal question which must
be answered here is: should testing be concentrated on input data
combinations which are most likely to be chosen by the user, or on input
data combinations with a large a priori probability of causing program
failure? The answer is that tests should be applied to both of these
combinations of input data in a ratio which can be calculated from Eq. (33)
below.

Actually, several simpler models are also analyzed before that
described by Eqs. (30) and (33). All of these expressions contain many
parameters, and curves are plotted for selected values. No real data were
used for the parameters, but this should certainly be done in the future.

3.0  Models 


3.1 Elementary Model. A simple case might be the following: The
module has N possible input values, and each of these is equally likely to
be chosen by the tester or by the user. Of these input values, W cause
improper functioning of the module, but neither the tester nor the user
knows which inputs cause errors or even how large W is. The tester
chooses t inputs at random without keeping a record of inputs previously
tested, i.e., sampling with replacement. Under these circumstances

$$P_e = \left(1 - \frac{W}{N}\right)^{t} \frac{W}{N} \approx \frac{W}{N}\, e^{-Wt/N} \qquad (2)$$

A plot of $P_e$ vs. W is displayed in Fig. 1. If the testing is to do any
good, i.e., to reduce $P_e$ significantly below W/N, then it is necessary that

$$t \gg \frac{N}{W} \qquad (3)$$

In many applications it is found that satisfying the inequality of Eq. (3)
requires a very large number of tests t, and that our intuitive feeling is
that $P_e$ is acceptably small in spite of testing using far fewer tests than
N/W.


FIGURE 1. PLOT OF $P_e$ VS. W FROM THE MODEL OF EQUATION 2.
(Axes: error probability $P_e$ vs. W, the number of input values causing
malfunctions; parameters: N, the number of input values, and t, the number
of inputs tested; the worst value of W is marked.)
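As a quick numerical check on Eq. (2), the Python sketch below evaluates its exact and exponential forms and the worst value of W. The function names are our own (hypothetical), and N = 1000, t = 100 are arbitrary illustrative values, not data from this report. Setting the derivative of $(W/N)e^{-Wt/N}$ with respect to W to zero gives W = N/t, where $P_e \approx 1/(et)$; this is presumably the worst value of W marked in Fig. 1.

```python
import math


def pe_exact(N, W, t):
    """Eq. (2), exact form: the tester misses all W faulty inputs in t draws
    with replacement, (1 - W/N)**t, and then the user hits one, W/N."""
    return (1 - W / N) ** t * (W / N)


def pe_approx(N, W, t):
    """Eq. (2), exponential approximation (W/N) * exp(-W t / N)."""
    return (W / N) * math.exp(-W * t / N)


def worst_W(N, t):
    """W maximizing the approximate P_e: d/dW [(W/N) exp(-W t / N)] = 0 at W = N/t."""
    return N / t


if __name__ == "__main__":
    N, t = 1000, 100
    for W in (1, 10, 100, 500):
        print(W, pe_exact(N, W, t), pe_approx(N, W, t))
    print("worst W:", worst_W(N, t), "bound 1/(e t):", 1 / (math.e * t))
```

Because $\max_x x(1-x)^t = t^t/(t+1)^{t+1} \le 1/(et)$, the exact form of Eq. (2) never exceeds 1/(et) either, whatever W is; with t = 100 that is about 0.0037.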


It will continue to be assumed that the tester passes no information to
the user about which inputs were tested. It will also be assumed that each
test either succeeds or fails and that there is no additional information
which would permit sequential sampling methods. Sampling without
replacement would slightly lower $P_e$ to

$$P_e = \begin{cases} \dfrac{(N-W)!\,(N-t)!}{(N-W-t)!\,N!}\,\dfrac{W}{N}, & t \le N-W \\[1ex] 0, & t > N-W \end{cases} \qquad (4)$$

Note that this method requires that the tester keep a record of inputs
already tested, or that he avoid duplication by other means.
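The factorial ratio in Eq. (4) is the hypergeometric probability $\binom{N-W}{t}/\binom{N}{t}$ that t distinct draws avoid all W faulty inputs. A short Python sketch (hypothetical function names, arbitrary N and W) compares it with Eq. (2); it relies on `math.comb`, which returns 0 when t exceeds N - W, so the second case of Eq. (4) comes out automatically.

```python
import math


def pe_with_replacement(N, W, t):
    """Eq. (2), exact form."""
    return (1 - W / N) ** t * (W / N)


def pe_without_replacement(N, W, t):
    """Eq. (4): t distinct inputs all avoid the W faulty ones, then the user hits one.
    math.comb(n, k) is 0 for k > n, which gives P_e = 0 when t > N - W."""
    return math.comb(N - W, t) / math.comb(N, t) * (W / N)


if __name__ == "__main__":
    N, W = 10, 2
    for t in range(0, N + 1):
        print(t, pe_with_replacement(N, W, t), pe_without_replacement(N, W, t))
```

Each factor $(N-W-k)/(N-k)$ of the ratio is at most $(N-W)/N$, so the without-replacement value should never exceed the with-replacement one, in line with the remark that it is "slightly" lower.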

The particular type of error to which $P_e$ pertains must be borne in
mind to avoid confusion with other possible definitions of error. If W = N,
all inputs to the module cause malfunctions, but $P_e = 0$ according to
Equations (2) and (4). This is true because the tester removes user bugs
with each test, and continues until all tests are exhausted. In this case
the tester always rejects the module (provided only that t > 0), and so the
user cannot experience a malfunction. Note that $P_e$ is neither the
probability {user has a malfunction} nor the probability {user has a
malfunction | tester accepts module}. Rather, $P_e$ is the probability
{user has a malfunction and tester accepts module}. If the number of tests
is … and if … is done, then the value of … is

(5)


From Eq. (5) or Fig. 1, it can be seen that as $W \to 0$, $P_e \to 0$ also.
This is so because for small W there is a small chance that the user will
encounter a faulty input. Similarly, as $W \to N$, $P_e \to 0$ also, because
there is little chance that the tester will accept the module. Of course, the
last case is undesirable for reasons other than the value of $P_e$, i.e., the
user has a small probability of receiving a released program.

3.2 Model with Unequal Probabilities. The model described in the
preceding section is too simple to apply to most practical testing situations:
some inputs are more likely to fail than others, and the probability of one
input failing may not be independent of another input failing. Often a
single bug may cause many inputs to fail. The tester may not choose the
inputs to be tested randomly, but rather in such a way as to utilize his
knowledge of the prior failure probabilities. The user may not be free to
choose more reliable inputs; in fact, he may be constrained by the problem
to use less reliable inputs. The model to be described here includes three
events, the last two being independent of each other but dependent on the
first.

(1) A programmer constructs a module which has an error pattern
$\alpha$ with probability $P(\alpha)$. $\alpha$ might be a binary vector
$(\alpha_1, \alpha_2, \ldots, \alpha_N)$ with $\alpha_i = 1$ meaning input i fails and
$\alpha_i = 0$ meaning input i functions correctly.

(2) A tester tries certain inputs to the module and accepts the
module with probability $R(\text{accept} \mid \alpha)$. The tester passes no
information concerning which inputs were tested to the user.

(3) A user selects one of the inputs and the module fails with
probability $Q(\text{fail} \mid \alpha)$. The error probability defined
previously is now given by

$$P_e = \sum_{\alpha \in A} P(\alpha)\, Q(\text{fail} \mid \alpha)\, R(\text{accept} \mid \alpha) \qquad (6)$$

where A is the set of all possible error patterns.


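To illustrate, if there are N input values then A consists of $2^N$
elements. Assume $P(\alpha)$ is 0 for all $\alpha$ except the single pattern in
which the first W inputs fail (so that $P(\alpha_W) = 1$):

$$\alpha_W = (\underbrace{1, 1, \ldots, 1}_{W}, \underbrace{0, 0, \ldots, 0}_{N-W}) \qquad (6a)$$

and let the probability of the user selecting input i be $q_i$. Then

$$Q(\text{fail} \mid \alpha_W) = \sum_{i=1}^{W} q_i$$

Assume the tester selects his inputs randomly, choosing input i with
probability $r_i$.* Then

$$R(\text{accept} \mid \alpha_W) = \prod_{i=1}^{W} (1 - r_i)$$

$$P_e = \sum_{i=1}^{W} q_i \prod_{j=1}^{W} (1 - r_j) \qquad (7)$$

If $q_i = 1/N$ and $r_i = t/N$, this reduces (approximately) to the elementary
model given above:

$$P_e = \frac{W}{N}\left(1 - \frac{t}{N}\right)^{W} \approx \frac{W}{N}\, e^{-Wt/N} \qquad (8)$$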

Another, more useful, form for $P(\alpha)$ than Eq. (6a) is obtained by
assuming that the i'th input malfunctions with probability $p_i$. Then

$$P(\alpha) = \prod_{i=1}^{N} p_i^{\alpha_i} (1 - p_i)^{1-\alpha_i}$$

$$Q(\text{fail} \mid \alpha) = \sum_{j=1}^{N} \alpha_j q_j$$

$$R(\text{accept} \mid \alpha) = \prod_{j=1}^{N} (1 - r_j)^{\alpha_j}$$

* The formula also applies if $r_i$ is given the interpretation "input i is
tested and the response is noted to be wrong by the tester." Specifically,
$r_i = 0$ might mean either that input i was not tested, or that it was
tested and an error was not detected.


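The two products can be combined in evaluating $P_e$:

$$P_e = \sum_{\alpha_1} \sum_{\alpha_2} \cdots \sum_{\alpha_N} \left( \sum_{j=1}^{N} \alpha_j q_j \right) \prod_{i=1}^{N} E_i(\alpha_i)$$

where

$$E_i(\alpha_i) = p_i^{\alpha_i} (1 - p_i)^{1-\alpha_i} (1 - r_i)^{\alpha_i}$$

Since $\sum_{\alpha_i} E_i(\alpha_i) = 1 - p_i r_i$ and
$\sum_{\alpha_j} \alpha_j E_j(\alpha_j) = p_j (1 - r_j)$, this reduces to

$$P_e = \prod_{i=1}^{N} (1 - p_i r_i) \sum_{j=1}^{N} \frac{q_j p_j (1 - r_j)}{1 - p_j r_j} \qquad (9)$$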
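Eq. (9) can be checked against the $2^N$-term sum above for a small N. The Python sketch below does this with arbitrary illustrative parameters (not data from this report; the function names are hypothetical). The closed form assumes $p_j r_j < 1$ for every j, so that no denominator vanishes.

```python
from itertools import product
from math import prod


def pe_bruteforce(p, q, r):
    """Sum over all 2**N error patterns alpha of (sum_j alpha_j q_j) * prod_i E_i(alpha_i)."""
    total = 0.0
    for alpha in product((0, 1), repeat=len(p)):
        # prod_i E_i(alpha_i) = P(alpha) * R(accept | alpha)
        weight = prod(
            (pi if a else 1 - pi) * ((1 - ri) if a else 1.0)
            for a, pi, ri in zip(alpha, p, r)
        )
        q_fail = sum(a * qi for a, qi in zip(alpha, q))  # Q(fail | alpha)
        total += q_fail * weight
    return total


def pe_closed_form(p, q, r):
    """Eq. (9); requires p_j * r_j < 1 for every j."""
    front = prod(1 - pi * ri for pi, ri in zip(p, r))
    return front * sum(qj * pj * (1 - rj) / (1 - pj * rj) for pj, qj, rj in zip(p, q, r))


if __name__ == "__main__":
    p = [0.1, 0.3, 0.05, 0.2]
    q = [0.4, 0.2, 0.3, 0.1]
    r = [0.4, 0.0, 0.9, 0.5]
    print(pe_bruteforce(p, q, r), pe_closed_form(p, q, r))
```

Setting $r_i = 1$ for tested inputs and $r_i = 0$ otherwise turns the closed form into the deterministic expression that follows.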


If the tester selects t inputs deterministically, and if the inputs are
permuted so that these occur first, then

$$P_e = \prod_{i=1}^{t} (1 - p_i) \sum_{j=t+1}^{N} p_j q_j \qquad (10)$$
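Eq. (10) is cheap to evaluate for any candidate set of tested inputs, so the one-input-at-a-time analysis in the next paragraph can be checked numerically. In the Python sketch below (hypothetical names, arbitrary parameters), `greedy_tests` adds inputs one at a time, each time taking the input that maximizes $p_k[P_U + (1-p_k)q_k]$, the selection rule of Eq. (11). This is a one-step (myopic) rule; it is not guaranteed to give the best set of t inputs overall.

```python
from math import prod


def pe_deterministic(p, q, tested):
    """Eq. (10): P_e = P_T * P_U for a given set of tested input indices."""
    p_t = prod(1 - p[i] for i in tested)                               # tester misses every fault
    p_u = sum(p[j] * q[j] for j in range(len(p)) if j not in tested)  # user hits an untested fault
    return p_t * p_u


def greedy_tests(p, q, t):
    """Add t inputs one at a time, each maximizing p_k * (P_U + (1 - p_k) * q_k)."""
    tested = set()
    for _ in range(t):
        p_u = sum(p[j] * q[j] for j in range(len(p)) if j not in tested)
        candidates = [k for k in range(len(p)) if k not in tested]
        best = max(candidates, key=lambda k: p[k] * (p_u + (1 - p[k]) * q[k]))
        tested.add(best)
    return tested


if __name__ == "__main__":
    p = [0.5, 0.1, 0.3, 0.05]
    q = [0.05, 0.6, 0.2, 0.15]
    for t in range(len(p) + 1):
        chosen = greedy_tests(p, q, t)
        print(t, sorted(chosen), pe_deterministic(p, q, chosen))
```

With these numbers the rule first picks input 0, which has the largest $p_k$ even though its $p_k q_k$ is well below that of inputs 1 and 2: the compromise between the two factors at work.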


An optimum testing strategy is obtained if the inputs are permuted so that
the expression is minimized. The first factor suggests testing inputs with
the largest $p_i$, but the second factor suggests testing inputs with the
largest $p_i q_i$. Let the above expression be abbreviated as $P_e = P_T P_U$,
and note that $P_U$ is the probability of the user getting an error on an
untested input. Consider the effect of adding one more input, k, to the
test. Then

$$P_e' = P_T (1 - p_k)(P_U - p_k q_k) = P_e (1 - p_k)\left(1 - \frac{p_k q_k}{P_U}\right)$$

Expanding gives $P_e' = P_T\,[P_U - p_k (P_U + (1 - p_k) q_k)]$. Thus, the
criterion is to select the input with the largest

$$p_k \left[ P_U + (1 - p_k)\, q_k \right] \approx q_k p_k + P_U\, p_k \qquad (11)$$

where the approximation holds for small $p_k$. As can be seen, this provides
a weighted compromise between selection on the basis of $p_k$ or $q_k p_k$
alone.

3.3 Model with Statistical Dependence. Computer programs usually
fail for a whole set of input values as a result of a single bug. It is more




realistic  to  assume  that  failures  due  to  different  bugs  are  statistically 
independent  rather  than  failures  for  different  input  values.  For  example,  a 
single  oversight  might  cause  a  square  root  program  to  fail  for  all  negative 
numbers.  Let 


$$\sigma_{ij} = \begin{cases} 1 & \text{if the } j\text{'th bug causes failure for input } i \\ 0 & \text{if the } j\text{'th bug doesn't affect input } i \end{cases}$$

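If $p_j$ is the probability of the j'th bug, if M is the number of possible
bugs, and if $\beta_j$ (j = 1, 2, ..., M) is the pattern of actual bugs, then
(note the definitions of P, Q, R in the probability-of-event statements (1),
(2), and (3) of Section 3.2)

$$P_e = \sum_{\beta} P(\beta)\, Q(\text{fail} \mid \beta)\, R(\text{accept} \mid \beta) \qquad (12)$$

the sum being over all $2^M$ bug patterns $\beta$. This is analogous to the
corresponding formula, Eq. (6), for the error pattern $\alpha$ given above. Here,

$$P(\beta) = \prod_{i=1}^{M} p_i^{\beta_i} (1 - p_i)^{1-\beta_i}$$

$$Q(\text{fail} \mid \beta) = \sum_{j=1}^{N} q_j \left[ 1 - \prod_{k=1}^{M} (1 - \sigma_{jk} \beta_k) \right] \qquad (13)$$

$$R(\text{accept} \mid \beta) = \prod_{i=1}^{M} \prod_{\ell=1}^{N} (1 - r_\ell)^{\sigma_{\ell i} \beta_i}$$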


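Combining and rearranging gives

$$P_e = \sum_{j=1}^{N} q_j \sum_{\beta_1} \cdots \sum_{\beta_M} \Delta_j(\beta_1, \beta_2, \ldots, \beta_M) \prod_{i=1}^{M} F_i(\beta_i)$$

where

$$\Delta_j(\beta_1, \ldots, \beta_M) = 1 - \prod_{k=1}^{M} (1 - \sigma_{jk} \beta_k)$$

and

$$F_i(\beta_i) = p_i^{\beta_i} (1 - p_i)^{1-\beta_i} \prod_{\ell=1}^{N} (1 - r_\ell)^{\sigma_{\ell i} \beta_i}$$

The summations over $\beta_i$ are for only two values (0, 1) and can be
carried out (using $\sum_j q_j = 1$) to give

$$P_e = \prod_{i=1}^{M} \left[ 1 - p_i + p_i \prod_{\ell=1}^{N} (1 - r_\ell)^{\sigma_{\ell i}} \right] - \sum_{j=1}^{N} q_j \prod_{i=1}^{M} \left[ 1 - p_i + (1 - \sigma_{ji})\, p_i \prod_{\ell=1}^{N} (1 - r_\ell)^{\sigma_{\ell i}} \right] \qquad (14)$$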
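As a sanity check on the summation leading to Eq. (14), the Python sketch below evaluates Eqs. (12) and (13) by enumerating all $2^M$ bug patterns and compares the result with the closed form. The function names are hypothetical and the 4-input, 3-bug matrix is arbitrary (it is not taken from Fig. 2). `sigma[j][i]` stands for $\sigma_{ji}$ (input j, bug i), and the $q_j$ must sum to one.

```python
from itertools import product
from math import prod


def pe_bugs_bruteforce(p, q, r, sigma):
    """Eqs. (12)-(13): enumerate all 2**M bug patterns beta."""
    N, M = len(q), len(p)
    total = 0.0
    for beta in product((0, 1), repeat=M):
        p_beta = prod(pi if b else 1 - pi for pi, b in zip(p, beta))
        q_fail = sum(q[j] * (1 - prod(1 - sigma[j][k] * beta[k] for k in range(M)))
                     for j in range(N))
        r_acc = prod((1 - r[ell]) ** (sigma[ell][i] * beta[i])
                     for i in range(M) for ell in range(N))
        total += p_beta * q_fail * r_acc
    return total


def pe_bugs_closed_form(p, q, r, sigma):
    """Eq. (14); assumes sum(q) == 1."""
    N, M = len(q), len(p)
    # probability that bug i, if present, is excited by no tested input
    miss = [prod((1 - r[ell]) ** sigma[ell][i] for ell in range(N)) for i in range(M)]
    first = prod(1 - p[i] + p[i] * miss[i] for i in range(M))
    second = sum(q[j] * prod(1 - p[i] + (1 - sigma[j][i]) * p[i] * miss[i] for i in range(M))
                 for j in range(N))
    return first - second


if __name__ == "__main__":
    p = [0.2, 0.1, 0.3]
    q = [0.4, 0.3, 0.2, 0.1]
    r = [0.3, 0.0, 0.5, 0.2]
    sigma = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
    print(pe_bugs_bruteforce(p, q, r, sigma), pe_bugs_closed_form(p, q, r, sigma))
```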


If deterministic testing is used ($r_\ell$ = 0 or 1) over the first t inputs:

$$P_e = \prod\nolimits' (1 - p_i) \sum_{j=t+1}^{N} q_j \left[ 1 - \prod\nolimits'' (1 - \sigma_{ji} p_i) \right]$$

where $\prod'$ refers to a product over terms involving a value of i such that
$\sigma_{ji} = 1$ for at least one value of j in the range j = 1, 2, ..., t,
and where $\prod''$ refers to the other values of i. The bugs can be permuted
so that $1 \le i \le T$ implies $\sigma_{ji} = 1$ for at least one value of j
from 1 to t, and $T < i \le M$ implies $\sigma_{ji} = 0$ for all values of j
from 1 to t. Then

$$\prod\nolimits' = \prod_{i=1}^{T} \quad \text{and} \quad \prod\nolimits'' = \prod_{i=T+1}^{M}$$

The problem is then to minimize

$$P_e = \prod_{i=1}^{T} (1 - p_i) \sum_{j=t+1}^{N} q_j \left[ 1 - \prod_{i=T+1}^{M} (1 - \sigma_{ji} p_i) \right] \qquad (15)$$
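With $r_\ell \in \{0, 1\}$ the acceptance factor in Eq. (13) is exactly 1 or 0, so Eq. (15) can be checked against a direct enumeration in which the tester accepts only if none of the first t inputs fails. A Python sketch (hypothetical names; the same kind of arbitrary 4-input, 3-bug example as above, with `sigma[j][i]` = $\sigma_{ji}$). The list of covered bugs plays the role of $i \le T$, so no explicit permutation of the bugs is needed.

```python
from itertools import product
from math import prod


def pe_eq15(p, q, sigma, t):
    """Eq. (15): inputs 0..t-1 are tested deterministically, the rest are not."""
    N, M = len(q), len(p)
    covered = [any(sigma[j][i] for j in range(t)) for i in range(M)]  # the bugs i <= T
    p_t = prod(1 - p[i] for i in range(M) if covered[i])
    p_u = sum(
        q[j] * (1 - prod(1 - sigma[j][i] * p[i] for i in range(M) if not covered[i]))
        for j in range(t, N)
    )
    return p_t * p_u


def pe_direct(p, q, sigma, t):
    """Enumerate bug patterns: accept iff none of inputs 0..t-1 fails; the user
    then fails on input j (probability q_j) if any present bug excites it."""
    N, M = len(q), len(p)
    total = 0.0
    for beta in product((0, 1), repeat=M):
        weight = prod(pi if b else 1 - pi for pi, b in zip(p, beta))
        fails = [any(sigma[j][i] and beta[i] for i in range(M)) for j in range(N)]
        if not any(fails[:t]):
            total += weight * sum(q[j] for j in range(N) if fails[j])
    return total


if __name__ == "__main__":
    p = [0.2, 0.1, 0.3]
    q = [0.4, 0.3, 0.2, 0.1]
    sigma = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
    for t in range(len(q) + 1):
        print(t, pe_eq15(p, q, sigma, t), pe_direct(p, q, sigma, t))
```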


A  graphical  interpretation  of  the  above  is  portrayed  in  Fig.  2,  and 
illustrates  how  different  inputs  excite  various  bugs.  For  example,  input  2 
excites bugs numbered 6, 7, 8 and 9. Bug number 8 can only be
discovered through the application of inputs 2 or 4.

A clearer picture of how $P_e$ depends on the amount of testing, t, can
be obtained by deriving an upper bound from Eq. (15). For most cases of
interest this upper bound will be tight, i.e., a good approximation.
Multiply through by the first product in Eq. (15):
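$$P_e = \sum_{j=t+1}^{N} q_j \left[ \prod_{i=1}^{T} (1 - p_i) - \prod_{i=1}^{M} (1 - \mu_{ji} p_i) \right] \qquad (16)$$

where

$$\mu_{ji} = \begin{cases} 1, & 1 \le i \le T \\ \sigma_{ji}, & T < i \le M \end{cases}$$

Now observe the following:

$$\prod_{i=1}^{M} (1 - \mu_{ji} p_i) \ge \prod_{i=1}^{M} (1 - p_i) \quad \text{since } \mu_{ji} \le 1$$

$$\sum_{j=t+1}^{N} q_j = Q(t) \le 1 \quad \left( \text{note } Q(t) = 1 - \frac{t}{N} \approx 1 \text{ if } q_j = \frac{1}{N} \right)$$

$$\prod_{i=1}^{T} (1 - p_i) \le e^{-\sum_{i=1}^{T} p_i}$$

Combining these (the bracket in Eq. (16) is nonnegative, so the factor
$Q(t) \le 1$ can be dropped):

$$P_e \le \hat{P}_e = e^{-\sum_{i=1}^{T} p_i} - R_0 \qquad (17)$$

where

$$R_0 = \prod_{i=1}^{M} (1 - p_i) \approx e^{-\sum_{i=1}^{M} p_i}$$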


From the way the $\sigma_{ji}$ matrix was permuted, it can be seen that
$t \to N$ implies $T \to M$ (monotonically), and from Eq. (16) these, in turn,
imply $P_e \to 0$ (monotonically). If Eq. (17) is a good approximation to
Eq. (16), then the rate at which $P_e \to 0$ can be seen to be exponential.
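The chain of inequalities behind Eq. (17) can be spot-checked numerically. The Python sketch below (hypothetical names; an arbitrary incidence matrix with `sigma[j][i]` = $\sigma_{ji}$) evaluates Eq. (16) with the $\mu_{ji}$ defined above and confirms that it never exceeds $\hat{P}_e$ as t runs from 0 to N.

```python
from math import exp, prod


def covered_bugs(sigma, t):
    """Bugs excited by at least one of the first t inputs (the bugs i <= T)."""
    M = len(sigma[0])
    return [any(sigma[j][i] for j in range(t)) for i in range(M)]


def pe_eq16(p, q, sigma, t):
    """Eq. (16) with mu_ji = 1 for covered bugs and sigma_ji otherwise."""
    N, M = len(q), len(p)
    cov = covered_bugs(sigma, t)
    p_t = prod(1 - p[i] for i in range(M) if cov[i])
    total = 0.0
    for j in range(t, N):
        mu = [1 if cov[i] else sigma[j][i] for i in range(M)]
        total += q[j] * (p_t - prod(1 - mu[i] * p[i] for i in range(M)))
    return total


def bound_eq17(p, sigma, t):
    """Eq. (17): exp(-sum of p_i over covered bugs) - R_0."""
    cov = covered_bugs(sigma, t)
    r0 = prod(1 - pi for pi in p)
    return exp(-sum(pi for pi, c in zip(p, cov) if c)) - r0


if __name__ == "__main__":
    p = [0.2, 0.1, 0.3]
    q = [0.4, 0.3, 0.2, 0.1]
    sigma = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
    for t in range(len(q) + 1):
        print(t, pe_eq16(p, q, sigma, t), bound_eq17(p, sigma, t))
```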

3.4 Illustrative Special Case. A simple special case will illustrate
the above formulas, and is useful in getting a rough idea of the number of
tests and the probability of error. Suppose that each of the M bugs occurs
with the same probability of error, p, and affects the same number of input
values, b. Suppose further that the pattern of input errors is the most
difficult to detect with the given number of tests, t, i.e., that the error
subsets are as "disjoint" as possible. If the tests are distributed most
effectively over the inputs, each test will detect Mb/N bugs. The testing
will result in M - Mbt/N undetected bugs, and T = Mbt/N (assuming
bt ≤ N). Thus

$$P_e \le \hat{P}_e = e^{-pMbt/N} - e^{-pM} \qquad (18)$$

which falls to zero at t = N/b, when every bug has been covered. … With
more general assumptions concerning the parameters, and with a … choice of
tests, t might be a much smaller fraction of N.

3.5 Optimum Testing Strategy (in Section 3.3). For a given t, that
selection of inputs to be tested which minimizes $P_e$ will be defined as the
optimum testing strategy. From Eq. (15), and intuition, one might be
tempted to test the t inputs most likely to be chosen by the user, thus
minimizing the factor $\sum_{j=t+1}^{N} q_j$. However, for most cases of
interest, this approach would be futile. If the $q_j$ are even very roughly
equal, the number of possible inputs N is so huge that it would be
impossible to test enough of them to reduce $\sum_{j=t+1}^{N} q_j$
significantly below unity. Essentially, the only influence of the testing
strategy on this factor is through the fact that the square-bracketed term
in Eq. (15) depends on T.


The first term in Eq. (15) is the most important for values of the
parameters of most interest, and as was shown above, it is minimized
(approximately) by choosing test values to maximize

$$\sum_{i=1}^{T} p_i \qquad (19a)$$

If the $p_i$ are equal, $P_e$ is then minimized (approximately) by maximizing T.

This is essentially the covering problem of switching theory and operations
research. A set of inputs $i \in I$ is said to cover a set of bugs $j \in J$
if, for every j in J, $\sigma_{ij} = 1$ for at least one i in I. Referring to
Fig. 2, input 4 covers bugs 3, 4, 5, 6, 7 and 8; and input pair 1, 4 covers
bugs 3, 4, 5, 6, 7, 8, 10, 11, 12 and 13. These inputs are optimum in that
they cover the most bugs possible. Thus, using these optimum choices,
T = 6 and 10 for t = 1 and 2. However, the optimum set of 3 inputs is not
obtained by adding that input which would cover the most additional bugs.
Inputs 1, 3, 4 cover only 12 bugs, but inputs 1, 2, 3 cover 13 bugs. The
covering problem is usually stated as minimizing the number of tests
required to cover all bugs (in the present terminology), and no efficient
general algorithm for its solution is known. The process described above,
repeatedly adding that test which causes the greatest increment in bugs
covered, is usually referred to as the "heaviest-first algorithm."
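The failure of the heaviest-first rule at t = 3 is easy to reproduce. Fig. 2 itself is not available here, so the Python sketch below uses a hypothetical incidence table that we constructed to agree with every fact quoted from Fig. 2 in the text (input 2 excites bugs 6 to 9, input 4 covers bugs 3 to 8, bug 8 is reachable only from inputs 2 and 4, and the coverage counts 6, 10, 12 and 13); the actual figure may differ. `optimum` is a plain exhaustive search, feasible only for tiny examples.

```python
from itertools import combinations

# Hypothetical incidence table: bugs excited by each input (labels as in the text).
EXCITES = {
    1: {10, 11, 12, 13},
    2: {6, 7, 8, 9},
    3: {1, 2, 3, 4, 5},
    4: {3, 4, 5, 6, 7, 8},
}


def covered(inputs, excites=EXCITES):
    """Set of bugs covered by a collection of inputs."""
    return set().union(*(excites[i] for i in inputs))


def heaviest_first(t, excites=EXCITES):
    """Repeatedly add the input giving the greatest increment in bugs covered."""
    chosen = []
    for _ in range(t):
        best = max((i for i in excites if i not in chosen),
                   key=lambda i: len(covered(chosen + [i], excites)))
        chosen.append(best)
    return sorted(chosen)


def optimum(t, excites=EXCITES):
    """Exhaustive search over all t-input subsets."""
    return max(combinations(sorted(excites), t), key=lambda s: len(covered(s, excites)))


if __name__ == "__main__":
    for t in (1, 2, 3):
        g, o = heaviest_first(t), optimum(t)
        print(t, g, len(covered(g)), o, len(covered(o)))
```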

A  sketch  of  T  vs.  t  for  the  present  example  is  shown  in  Fig.  3. 

Note that the number of covered bugs added at each step is a
decreasing function of t. If t is very large, the bugs being covered might
be those affecting only a single input, such as divide-by-zero bugs.

The rate of increase of $\sum_{i=1}^{T} p_i$ with increasing t is certainly of
great interest, and one simple functional dependence can be given using the
above ideas. Define $T_t$ as those values of T corresponding to
t = 1, 2, ... when an optimum or near-optimum testing strategy is used.
Eq. (19a) can be rewritten

$$\sum_{i=1}^{T_t} p_i = \sum_{k=1}^{t} \Delta_k \qquad (20)$$

where

$$\Delta_k = \sum_{i=T_{k-1}+1}^{T_k} p_i, \qquad T_0 = 0$$

From Eq. (15) with the second and third terms approximated by unity, or
from Eq. (17) with $R_0$ approximated by zero:

$$P_e(t) \approx \exp\left( -\sum_{k=1}^{t} \Delta_k \right) \qquad (21)$$


1 


PpU)  ' 


(21) 


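Now if $\Delta_k$ is assumed to have a form which is a decreasing function of
k and which can be finitely summed in closed form, a convenient relation can
be found. For example, if $\Delta_k = c/k$, then, since
$\sum_{k=1}^{t} 1/k \approx \ln t + 0.5772$ (Euler's constant),

$$P_e(t) \approx \exp\left( -c \sum_{k=1}^{t} \frac{1}{k} \right) \approx \exp\left[ -(0.5772\,c + c \ln t) \right] = e^{-0.5772\,c}\, t^{-c} \qquad (22)$$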

This shows what might be expected in practice: a gradual decrease of $P_e$
over many decades as t is increased over several decades. When some
relations are plotted, e.g., Eq. (2) or Eq. (18), they exhibit very sudden
and deep drops in $P_e$ when certain values of t are approached, and this is
not the behavior to be expected except in the simplest programs. Of
course, Eq. (22) does not have enough parameters, but similar formulas can
be found which do have enough parameters.
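To see how good the approximation in Eq. (22) is, the Python sketch below (hypothetical names; c is a free illustrative parameter) compares $\exp(-c\sum_{k=1}^{t} 1/k)$ with $e^{-0.5772c}\,t^{-c}$. With c = 1 the power law predicts one decade of decrease in $P_e$ per decade of t.

```python
import math

EULER_GAMMA = 0.5772156649015329  # the 0.5772 in Eq. (22)


def pe_harmonic(c, t):
    """Eq. (21) with Delta_k = c / k, i.e. exp(-c * H_t)."""
    harmonic = sum(1.0 / k for k in range(1, t + 1))
    return math.exp(-c * harmonic)


def pe_power_law(c, t):
    """Eq. (22): exp(-0.5772 c) * t**(-c)."""
    return math.exp(-EULER_GAMMA * c) * t ** (-c)


if __name__ == "__main__":
    c = 1.0
    for t in (1, 10, 100, 1000, 10000):
        print(f"t={t:>6}  exact={pe_harmonic(c, t):.3e}  Eq.(22)={pe_power_law(c, t):.3e}")
```

Because $\sum_{k=1}^{t} 1/k - \ln t - 0.5772 \approx 1/(2t)$, the relative error of Eq. (22) shrinks roughly like c/(2t); it is poor only for the first few tests.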

3.6 Interpretation of a Program Bug. There are many possible ways
to interpret a bug:

1. Formally, a bug is simply a subset of input values which fail
together when the program is used, e.g., the region $B^2 - 4AC < 0$ in
solving a quadratic.

2. A bug might be a careless typing error, such as A+B instead of
A-B.

3.  A  bug  might  be  an  incorrect  statement  or  a  defective  subroutine. 

4. A more flexible definition of bug should allow such things as
omitting a check for dividing by zero. These bugs of omission are harder
to handle with fixed M and matrix $\sigma_{ij}$.

5. A bug might be defined as a distinct path through a flowchart. A
popular testing procedure is to run at least once through every such path,
and if these are the only bugs included, this constitutes a complete cover
and $P_e = 0$. Unfortunately, the model would then not be very realistic,
because $P_e$ would never actually vanish except in the most trivial programs.




6.  If  programming  can  be  identified  with  decisions,  a  bug  is  a  wrong 
decision. 

4.0 Random Testing


4.1 Random Test Models. Some of the difficulties involved in relating
the number of bugs discovered and the number of tests made can be avoided
if the tests are chosen randomly rather than deterministically. Optimum
strategies can still be found if the probabilities of choosing the inputs to
be tested are not equal. The probability of error for random testing has
already been found in Eq. (14) above. Let $s_\ell$ be the probability that the
tester chooses the $\ell$'th input at any particular test. If the number of
tests is t, then the probability that the $\ell$'th input is tested during the
series of t tests is

$$r_\ell = 1 - (1 - s_\ell)^t \qquad (23)$$

Note that the $s_\ell$ sum to unity, but that each $r_\ell$ can range from zero
to one. It is helpful to define

$$A_i = \prod_{\ell=1}^{N} (1 - s_\ell)^{\sigma_{\ell i}} \qquad (24)$$

which is the approximate probability of a single test missing the i'th bug.
Note that $A_i$ is certainly no greater than unity. Since Eq. (23) gives
$\prod_{\ell} (1 - r_\ell)^{\sigma_{\ell i}} = A_i^t$, Eq. (14) can now be
rewritten in terms of s, A and t:

$$P_e(t) = \prod_{i=1}^{M} \left( 1 - p_i + p_i A_i^t \right) \sum_{j=1}^{N} q_j \left[ 1 - \prod_{i=1}^{M} \left( 1 - \frac{\sigma_{ji}\, p_i A_i^t}{1 - p_i + p_i A_i^t} \right) \right] \qquad (25)$$

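As the number of tests is increased, the probability of error approaches
zero, provided every bug is excited by at least one input that the tester
chooses with nonzero probability (so that every $A_i < 1$):

$$\lim_{t \to \infty} P_e(t) = 0$$

If no tests are made,

$$P_e(0) = 1 - \sum_{j=1}^{N} q_j \prod_{i=1}^{M} (1 - \sigma_{ji} p_i) \qquad (26)$$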
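A Python sketch of Eqs. (24) to (26) follows; the names are hypothetical, and the incidence matrix, selection probabilities $s_\ell$ and other parameters are arbitrary illustrations. It builds $A_i$ from the $s_\ell$, evaluates $P_e(t)$, and checks that t = 0 reproduces Eq. (26). `sigma[l][i]` stands for $\sigma_{\ell i}$ (input $\ell$, bug i); the $q_j$ are assumed to sum to one, and every $p_i < 1$ so that the denominators in Eq. (25) stay positive.

```python
from math import prod


def detection_miss(s, sigma):
    """Eq. (24): A_i = prod over inputs ell of (1 - s_ell)**sigma[ell][i]."""
    N, M = len(s), len(sigma[0])
    return [prod((1 - s[ell]) ** sigma[ell][i] for ell in range(N)) for i in range(M)]


def pe_random(p, q, s, sigma, t):
    """Eq. (25): P_e after t random tests, each choosing input ell with probability s_ell."""
    N, M = len(q), len(p)
    A = detection_miss(s, sigma)
    g = [1 - p[i] + p[i] * A[i] ** t for i in range(M)]  # per-bug factor of the first product
    inner = sum(
        q[j] * (1 - prod(1 - sigma[j][i] * p[i] * A[i] ** t / g[i] for i in range(M)))
        for j in range(N)
    )
    return prod(g) * inner


def pe_no_tests(p, q, sigma):
    """Eq. (26): P_e(0) = 1 - sum_j q_j prod_i (1 - sigma_ji p_i)."""
    return 1 - sum(
        q[j] * prod(1 - sigma[j][i] * p[i] for i in range(len(p))) for j in range(len(q))
    )


if __name__ == "__main__":
    p = [0.2, 0.1, 0.3]
    q = [0.4, 0.3, 0.2, 0.1]
    s = [0.25, 0.25, 0.25, 0.25]
    sigma = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
    print("Eq. (26):", pe_no_tests(p, q, sigma))
    for t in (0, 1, 10, 100):
        print(t, pe_random(p, q, s, sigma, t))
```

Each term of the underlying sum over bug patterns carries a factor $A_i^{t}$ with $A_i \le 1$, so $P_e(t)$ computed this way should never increase with t.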


It is easier to see the importance of different terms in Eq. (25) by
examining the asymptotic form for large t:

(27)

where

$N_k$ is the number of bugs with a probability of discovery $Q_k$.
(These bugs, which have probability $Q_k$, will be called a "block.")

and

$\bar{p}_k$ is the average probability of bugs in the k'th block being
introduced by the program writer.

No  loss  of  generality  is  obtained  if  the  Q^.  are  arranged  in  order  of 

decreasing  magnitude.  Eg.  (30)  clearly  represents  a  decreasing  function  of 
t,  but  it  is  not  so  apparent  that  the  decrease  is  more  than  1/t  than  fxi)o- 
nential  over  the  range  of  interest.  To  see  this,  note  that  each  term  in 
Eq.  (30)  reaches  a  maximum  as  a  function  of  at  Qj^  =  T/t.  For  each 
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I  a  certain  term  in  the  sum  will  be  dominant  and  values  of  k  both  larger 
and  smaller  than  this  dominant  term  will  contribute  relatively  small  amounts. 
This leads to an approximate formula (31), where k is the block for which $Q_k \approx 1/t$.

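If the product $N_k \bar p_k$ is approximately independent of k, then $P_e$ will decrease approximately as 1/t. The decrease is exponential beyond $t = 1/Q_{\min}$, but this value could be very large indeed. For example, if the input is a 32-bit number, and if a bug causes an error for a single input value, then (with tests spread uniformly over all $2^{32}$ values) $Q_{\min} = 2^{-32} \approx 2.3 \times 10^{-10}$, so that $1/Q_{\min} \approx 4.3 \times 10^{9}$.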
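To make the location of the maximum explicit, suppose, as an illustrative assumption, that the k'th term of Eq. (30) behaves like $N_k \bar p_k\, Q_k e^{-Q_k t}$. One differentiation then gives the position and height of its peak:

```latex
\frac{\partial}{\partial Q_k}\left(Q_k e^{-Q_k t}\right) = e^{-Q_k t}\,(1 - Q_k t) = 0
\quad\Longrightarrow\quad Q_k = \frac{1}{t},
\qquad
\max_{Q_k}\; N_k \bar p_k\, Q_k e^{-Q_k t} = \frac{N_k \bar p_k}{e\,t}.
```

Under that assumption, a block with $Q_k t \gg 1$ is suppressed by the exponential, a block with $Q_k t \ll 1$ contributes at most about $N_k \bar p_k Q_k \ll N_k \bar p_k / t$, and the dominant block gives roughly $N_k \bar p_k/(e\,t)$, which falls as 1/t whenever $N_k \bar p_k$ is roughly constant.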


Such  a  bug  would  be  very  hard  to  detect  by  random  testing,  but  by 
the  same  token  it  would  be  very  unlikely  to  cause  a  user  error  unless  the 
user favors that value for some reason. The behavior of Eqs. (30) and
(31) is sketched in Fig. 4 and Fig. 5.


Fig. 5 shows the way in which the probability of error decreases with the number of tests t. The parameters were chosen to illustrate the following cases:

i) $P_e$ decreases slowly at first, then rapidly

ii) $P_e$ decreases gradually throughout the range

iii) $P_e$ decreases rapidly with the first few tests, then many more tests are required to obtain a further decrease.

As  might  be  expected,  in  case  (i)  above  most  bugs  are  difficult  to 
detect,  in  case  (ii)  bugs  are  evenly  distributed,  and  in  case  (iii)  bugs  are 
either  very  easy  or  very  difficult  to  detect. 


Program bugs have been divided into four, and in some cases five, classes, indicated by k in the following:

$Q_k$ is the probability of detecting a class k bug

$p_k$ is the probability that a user has introduced a class k bug

$M_k$ is the number of possible class k bugs.


The parameters $Q_k$ are always 0.1, .01, .001 and .0001, except where 5 classes are present, in which case $Q_k$ is 0.7, 0.1, .01, .01, .001, and .0001. Shown on the curves is the product $p_k M_k$ for each of the four or five classes. For example, the curve marked 1,1,1,1 applies to these cases:


[Table: several alternative parameter sets $(Q_k, p_k, M_k)$, all with $p_k M_k = 1$ in every class; the $(p_k, M_k)$ pairs used include (.05, 20), (.01, 100), (.005, 200), (.002, 500) and (.001, 1000).]

Ease of detection, as reflected in $Q_k$, depends largely on the number of bug effects.


4.0 Optimum Testing Strategy

Even though the inputs to be tested are to be selected randomly (Section 4.1), an optimum strategy exists in that the probabilities of testing various inputs can be chosen so as to minimize $P_e$. The asymptotic form expressed by Eq. (27) can be rewritten as


$$P_e(t) \approx \sum_{i=1}^{M} p_i \left(\sum_{j=1}^{N} q_j \alpha_{ji}\right) e^{-Q_i t} \qquad (32)$$


The quantities p and q are fixed by the problem, but the $s_i$ can be chosen to minimize $P_e$, at least insofar as the $Q_i$ depend on the $s_i$ (see Eq. (29)).

The optimum $s_i$ can easily be found for the special case where $\alpha_{ji} = \delta_{ji}$, i.e., where each bug affects only one input. The problem is to minimize the asymptotic form (with some additional approximations)

$$f = \sum_{i=1}^{N} p_i q_i e^{-s_i t}$$

while keeping constant

$$g = \sum_{i=1}^{N} s_i = 1.$$

The method of Lagrange multipliers gives the equation

$$0 = \frac{\partial f}{\partial s_i} + \lambda \frac{\partial g}{\partial s_i} = -p_i q_i\, t\, e^{-s_i t} + \lambda.$$

After eliminating λ to ensure that the $s_i$ sum to unity, this gives

$$s_i = \frac{1}{N} + \frac{1}{t} \ln\left(c\, p_i q_i\right) \qquad (33)$$

where

$$\ln c = -\frac{1}{N} \sum_{j=1}^{N} \ln\left(p_j q_j\right).$$
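The elimination of λ can be written out explicitly. The following sketch starts from the Lagrange condition above and recovers Eq. (33), including the value of ln c:

```latex
p_i q_i\, t\, e^{-s_i t} = \lambda
\;\Longrightarrow\;
s_i = \frac{1}{t}\ln\frac{t}{\lambda} + \frac{1}{t}\ln(p_i q_i),
\\[4pt]
\sum_{i=1}^{N} s_i = \frac{N}{t}\ln\frac{t}{\lambda} + \frac{1}{t}\sum_{i=1}^{N}\ln(p_i q_i) = 1
\;\Longrightarrow\;
\frac{1}{t}\ln\frac{t}{\lambda} = \frac{1}{N} - \frac{1}{N t}\sum_{j=1}^{N}\ln(p_j q_j),
\\[4pt]
s_i = \frac{1}{N} + \frac{1}{t}\Big[\ln(p_i q_i) - \frac{1}{N}\sum_{j=1}^{N}\ln(p_j q_j)\Big]
    = \frac{1}{N} + \frac{1}{t}\ln(c\, p_i q_i).
```

The last form makes the large-t behavior visible at once: the only term that depends on $p_i q_i$ is divided by t, so $s_i \to 1/N$ as the number of tests grows.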

This solution shows that if a small number of tests are made, the tester should favor the inputs most likely to be used and most likely to be in error, but that if a large number of tests are to be made, the tester should select the inputs with an essentially uniform distribution. Even where a non-uniform distribution of tests is indicated by Eq. (33), the tester should distribute his tests more uniformly than the user because of the logarithm (assuming the $p_i$ do not deviate too markedly from uniformity).
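A minimal Python sketch of Eq. (33), assuming the delta-function case of this section (bug i affects only input i). `optimal_test_distribution`, `objective` and the sample values of p and q are hypothetical. The sketch evaluates the formula directly and does not enforce $s_i \ge 0$, which the unconstrained Lagrange solution can violate when t is small compared with the spread of $\ln(p_i q_i)$.

```python
import math
from typing import List, Sequence


def optimal_test_distribution(p: Sequence[float], q: Sequence[float], t: float) -> List[float]:
    """Eq. (33) for alpha_ji = delta_ji:
        s_i = 1/N + (1/t) ln(c p_i q_i),   ln c = -(1/N) sum_j ln(p_j q_j).
    The s_i always sum to 1, but nothing here forces s_i >= 0."""
    n = len(p)
    logs = [math.log(p_i * q_i) for p_i, q_i in zip(p, q)]
    ln_c = -sum(logs) / n
    return [1.0 / n + (ln_c + log_i) / t for log_i in logs]


def objective(p: Sequence[float], q: Sequence[float], s: Sequence[float], t: float) -> float:
    """The function being minimized: f = sum_i p_i q_i exp(-s_i t)."""
    return sum(p_i * q_i * math.exp(-s_i * t) for p_i, q_i, s_i in zip(p, q, s))


if __name__ == "__main__":
    p = [0.05, 0.05, 0.05]  # illustrative bug probabilities
    q = [0.6, 0.3, 0.1]     # illustrative user input probabilities
    for t in (20, 2000):
        s = optimal_test_distribution(p, q, t)
        print(t, [round(x, 4) for x in s], objective(p, q, s, t))
```

With t = 20 the tester's distribution still leans toward the user's favorite input but is far flatter than q, as the logarithm predicts; with t = 2000 it is uniform to within about $5 \times 10^{-4}$. At the optimum every term $p_i q_i e^{-s_i t}$ takes the same value, which is exactly the Lagrange condition.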

The optimum testing strategy when $\alpha_{ji}$ is a delta function (each bug affecting only one input) can be quite misleading for the more general case. It may be quite impossible to make the $s_i$ in Eq. (32) either independent of i, or (even logarithmically) proportional to the $Q_i$. Also, if one returns to the more general Eq. (25) or Eq. (17) and attempts to choose the $s_\ell$ to minimize $P_e$ or $P_e'$, it is found that the best value for $s_\ell$ is either zero or one, and that these values depend on an unrealistically detailed knowledge of $\alpha_{ji}$. This is really the same as the optimum deterministic testing discussed in Section 3.5.

5.0 Alternative Definitions of Error


The definition of error which was used above may not be suitable for all purposes, but related probabilities can easily be calculated from the equations given. Define two events as follows:

E  is  the  event  that  a  user  employs  the  module  once  and  encounters  a 
failure. 




$E_m$ is the event that a tester operates the module t times and does not encounter a failure, i.e., the tester accepts the module. The subscript m is mnemonic for "missed," since bugs are always assumed to be present with some probability, and therefore $\Pr\{E_m\}$ is the probability that the tester has missed the bugs, i.e., incorrectly accepted the module.

Three different quantities which might be interpreted as the probability of user error are tabulated below (Table I) for the various models which have been analyzed. The effectiveness of the testing process can be judged by comparing the first with the last two. Note that the last, $\Pr\{E \mid E_m\}$, is never less than the second, $\Pr\{E, E_m\} = P_e$. This fact might lead to a preference for the conditional probability, $\Pr\{E \mid E_m\}$, in some cases where $P_e$ is small because of tester rejection and where there is a large probability of module bugs.
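The inequality follows directly from the definition of conditional probability. As a sketch, and assuming that the leading product in Eq. (25) plays the role of $\Pr\{E_m\}$ (the probability that every bug is either absent or missed by all t tests), the conditional version is simply the bracketed sum of Eq. (25):

```latex
\Pr\{E \mid E_m\} = \frac{\Pr\{E, E_m\}}{\Pr\{E_m\}} = \frac{P_e}{\Pr\{E_m\}} \;\ge\; P_e,
\qquad 0 < \Pr\{E_m\} \le 1,
\\[4pt]
\Pr\{E_m\} \approx \prod_{i=1}^{M}\left(1 - p_i + p_i A_i^t\right)
\;\Longrightarrow\;
\Pr\{E \mid E_m\} \approx \sum_{j=1}^{N} q_j \left[1 - \prod_{i=1}^{M}\left(1 - \frac{\alpha_{ji}\, p_i A_i^t}{1 - p_i + p_i A_i^t}\right)\right].
```

When testing rejects many faulty modules, $\Pr\{E_m\}$ is small and the two quantities can differ greatly, which is the situation described above.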

One is tempted to regard the goal here as minimizing the probability of user error, on the grounds that avoiding user error matters most to the user. But remember that this report seeks an optimum testing method, and that this is different from, and does not preclude, previously using an optimum programming method. The latter will minimize user error, but will not generally remove the need for testing in addition.

6.0  Conclusions  and  Comments 


It is believed that defining what is meant by the probability of a program error, and presenting a model which permits its exact calculation (Eq. (15)), will provide the nucleus around which a theory of software reliability can be built. The purpose is not merely to get a formula into which numbers can be plugged to give a probability of error; this does nothing to reduce errors. Rather, it is anticipated that by classifying different types of bugs, errors and test results, and by showing how they interact, programming systems can be improved and optimum testing procedures can be found.

Most reports on program testing select the test data so as to traverse each path in the program at least once. The approach described in this report might seem to be directed at selecting test data according to the problem specification. A near-optimum strategy may be to select test data randomly, since this seems to be a fairly efficient solution to the covering problem, but also to make sure that some test points are in various distinct data domains (e.g., $B^2 - 4AC < 0$) and that each program path is traversed. In addition, it is desirable to concentrate test data at points known to be more likely to be selected by the user. These requirements are often contradictory, but optimum compromises are suggested by Eq. (11), Section 3.5, and Section 4.2 above (provided the stated assumptions are met and the required parameters are available). Also, the formulas derived give some idea of how many tests need to be made to achieve a given reliability.
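The mixed strategy suggested above can be sketched for the quadratic example: draw test points at random, then add a point from any discriminant domain ($B^2 - 4AC$ negative, zero or positive) that the random draw happened to miss. This is a hypothetical illustration only; the coefficient range, the hand-picked representatives and the function names are assumptions, and path coverage and weighting toward user-favored inputs are not modeled.

```python
import random
from typing import Dict, List, Tuple

Coeffs = Tuple[int, int, int]  # (A, B, C) of A*x**2 + B*x + C


def discriminant_domain(a: int, b: int, c: int) -> str:
    """Classify a test point by the sign of B**2 - 4AC."""
    d = b * b - 4 * a * c
    if d < 0:
        return "negative"
    return "zero" if d == 0 else "positive"


# Hypothetical hand-picked representatives, one per data domain.
REPRESENTATIVES: Dict[str, Coeffs] = {
    "negative": (1, 0, 1),   # x**2 + 1
    "zero": (1, 2, 1),       # (x + 1)**2
    "positive": (1, 0, -1),  # x**2 - 1
}


def choose_tests(t: int, rng: random.Random, lo: int = -100, hi: int = 100) -> List[Coeffs]:
    """t uniformly random coefficient triples, topped up so that every
    discriminant domain is represented at least once."""
    tests = [(rng.randint(lo, hi), rng.randint(lo, hi), rng.randint(lo, hi))
             for _ in range(t)]
    covered = {discriminant_domain(*x) for x in tests}
    tests += [rep for name, rep in REPRESENTATIVES.items() if name not in covered]
    return tests


if __name__ == "__main__":
    tests = choose_tests(50, random.Random(1))
    print(len(tests), sorted({discriminant_domain(*x) for x in tests}))
```

The zero-discriminant domain is a thin set that uniform random sampling almost never hits, which is exactly the kind of gap the explicit domain check is meant to close.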


If further work is to be done along the lines of this report, it is suggested that:

1) Data be gathered so as to verify the derived relations between the number of tests and the error probability, especially as in Section 4.0.

2) Another model be developed in which a partial knowledge of the bug-input matrix α is assumed. This would lead both to a more easily applied optimum testing strategy and to a method of sequential testing in which the last test points are chosen according to the results of the first tests.